Introduction to the USTerritoryMapping Package

Introduction

This vignette gives an overview of the USTerritoryMapping R package, which seeks to make creating categorical choropleth maps of the US that include the US territories a little bit easier!

First load the package.

library(USTerritoryMapping)

State & Territory-Level Mapping

Prepare Data

To use this package, you will need to have a data frame with two columns:

  1. a categorical variable coded as a factor
  2. the two-letter US Postal Service code for each state and territory (e.g., VA for Virginia or VI for Virgin Islands). fipscodes.rda is provided to facilitate #2.
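As a quick sanity check before mapping, the two requirements above can be tested directly. The sketch below uses a hypothetical helper, check_map_data(), which is not part of the package. It builds the list of valid codes from base R's state.abb plus D.C. and the five territories, and the example data frame is made up for illustration.

```r
# Hypothetical helper (not part of USTerritoryMapping): checks the two
# requirements listed above for a state/territory-level data frame.
check_map_data <- function(data, join_var, fill_var) {
  # 50 states from base R, plus D.C. and the five territories
  valid_codes <- c(state.abb, "DC", "AS", "GU", "MP", "PR", "VI")
  problems <- character(0)
  if (!is.factor(data[[fill_var]])) {
    problems <- c(problems, paste0(fill_var, " is not a factor"))
  }
  bad <- setdiff(as.character(data[[join_var]]), valid_codes)
  if (length(bad) > 0) {
    problems <- c(problems,
                  paste0("unrecognized codes: ", paste(bad, collapse = ", ")))
  }
  problems
}

# Made-up example data for illustration only
toy <- data.frame(
  STUSPS = c("VA", "VI", "XX"),
  Percent.Cat = c("Less than 5%", "5% to <10%", "Less than 5%"),
  stringsAsFactors = FALSE
)

# Returns one message per problem; an empty vector means both checks passed
check_map_data(toy, join_var = "STUSPS", fill_var = "Percent.Cat")
```

Here the helper reports two problems: the fill column is still character, and "XX" is not a recognized postal code.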

For this vignette, we’ll use the two provided datasets, census.uninsured19 and cdc.cvd. census.uninsured19 is an example of a dataset with complete data for all 50 states, D.C., and the 5 US territories. It is already in the proper format for the provided package functions.

cdc.cvd is missing values for territories and requires additional processing which we will demonstrate below.

data(census.uninsured19)
data(cdc.cvd)

We can see that census.uninsured19 has these two components:

  1. “Percent.Cat”: the Percentage Ages 19 or Under with No Health Insurance, categorized as a factor
  2. “STUSPS”: the two-letter US Postal Service code

class(census.uninsured19$Percent.Cat)
#> [1] "factor"
table(census.uninsured19$Percent.Cat)
#> 
#>   Less than 5%     5% to <10% 10% or Greater 
#>             29             22              5
head(census.uninsured19$STUSPS)
#> [1] "PA" "CA" "WV" "UT" "NY" "DC"

In cdc.cvd we are missing the territories and our fill variable (Data_Value) has not yet been prepared as a factor.

We’ll first join the dataset to the provided fips_codes_state dataset to get the full list of jurisdictions. Then we’ll code a new factor variable for mapping.

library(dplyr)  # provides %>%, left_join(), mutate(), and case_when()
data("fips_codes_state")

cdc.cvd <- fips_codes_state %>%
              left_join(cdc.cvd, by = c("state" = "LocationAbbr")) %>%
              mutate(data.cat = factor(
                      case_when(
                            Data_Value < 198 ~ "Q1 (166 to < 198)",
                            Data_Value >= 198 & Data_Value < 215 ~ "Q2 (198 to < 215)",
                            Data_Value >= 215 & Data_Value < 248 ~ "Q3 (215 to < 248)",
                            Data_Value >= 248 & Data_Value < 400 ~ "Q4 (248 to 326)",
                            is.na(Data_Value) ~ "Data Not Available"
                      ),
                      levels = c("Q1 (166 to < 198)", "Q2 (198 to < 215)",
                                 "Q3 (215 to < 248)", "Q4 (248 to 326)", "Data Not Available")
                  )
               )

table(cdc.cvd$data.cat)
#> 
#>  Q1 (166 to < 198)  Q2 (198 to < 215)  Q3 (215 to < 248)    Q4 (248 to 326) 
#>                 13                 13                 12                 13 
#> Data Not Available 
#>                  6
class(cdc.cvd$data.cat)
#> [1] "factor"
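The same binning can also be written with base R's cut(), which some readers may find easier to maintain than a chain of comparisons. This is only a sketch of an alternative, not the package's method: the rates are made-up values, and it assumes open-ended outer breaks (-Inf and Inf) rather than the explicit upper bound of 400 used above.

```r
# Illustrative (made-up) rates, including a missing value
rates <- c(170.2, 198, 214.9, 215, 247.5, 300.1, NA)

labels <- c("Q1 (166 to < 198)", "Q2 (198 to < 215)",
            "Q3 (215 to < 248)", "Q4 (248 to 326)")

# right = FALSE makes each bin closed on the left, e.g. [198, 215),
# matching the >= / < comparisons in the case_when() call above.
# Assumption: open-ended outer breaks, so no value falls outside the bins.
binned <- cut(rates, breaks = c(-Inf, 198, 215, 248, Inf),
              labels = labels, right = FALSE)

# Recode missing values into their own level, kept last in the ordering
binned_chr <- as.character(binned)
binned_chr[is.na(binned_chr)] <- "Data Not Available"
data.cat <- factor(binned_chr, levels = c(labels, "Data Not Available"))

table(data.cat)
```

Listing "Data Not Available" last in levels keeps it at the end of the legend, as in the case_when() version.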

Mapping US with Territory Geometries

Using Census Insurance Data

Start by defining the fill category colors with their factor labels.

colors.census <- c("Less than 5%" = "#feebe2", 
                    "5% to <10%" = "#f768a1", 
                    "10% or Greater" = "#7a0177")

Then specify any required parameters of the function (see documentation for details).

map1_categorical(data = census.uninsured19, 
                 join_var = "STUSPS", 
                 fill_var = "Percent.Cat", 
                 fill_color = colors.census, 
                 legend_name = "Percent Uninsured",
                 territory_label_color = "black",
                 title = "Figure 1. Percent Uninsured, Ages <19 Years",
                 save.filepath = "saved-maps/map1-uninsure.png")
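Because the colors are defined by their factor labels, a typo in a name leaves a level without a color. A minimal sketch of a check that can be run before mapping is shown below; the fill factor here is a made-up stand-in for the real column.

```r
# Made-up factor standing in for the fill variable
fill <- factor(c("Less than 5%", "5% to <10%", "10% or Greater"),
               levels = c("Less than 5%", "5% to <10%", "10% or Greater"))

colors.census <- c("Less than 5%" = "#feebe2",
                   "5% to <10%" = "#f768a1",
                   "10% or Greater" = "#7a0177")

# Levels with no matching color name (character(0) means all are covered)
missing_colors <- setdiff(levels(fill), names(colors.census))
# Color names that do not correspond to any level (e.g. typos)
unused_colors <- setdiff(names(colors.census), levels(fill))

missing_colors
unused_colors
```

With the real data, replace fill with the fill column, for example census.uninsured19$Percent.Cat.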

Suppose we want to add a border to highlight specific states or territories. We first define a vector of US Postal Service codes (in this example, Oregon, Wisconsin, Virginia, and the US Virgin Islands) and then pass it to the border_ids parameter.

border <- c("OR", "WI", "VA", "VI")

map1_categorical(data = census.uninsured19, 
                 join_var = "STUSPS", 
                 fill_var = "Percent.Cat", 
                 fill_color = colors.census, 
                 legend_name = "Percent Uninsured",
                 title = "Figure 1. Percent Uninsured, Ages <19 Years",
                 border_ids = border,
                 border_color = "red",
                 border_linewidth = 1,
                 save.filepath = "saved-maps/map1-uninsure2.png")

Using CDC Cardiovascular Disease Mortality Data

Sometimes we may want to remove the inset box outline, which we can do by specifying inset_box_color = "white".

We can also hide the territory labels by specifying territory_label_color = "white".

colors.cdc <- c("Q1 (166 to < 198)" = "#ffffcc",
                 "Q2 (198 to < 215)" = "#a1dab4",
                 "Q3 (215 to < 248)" = "#41b6c4",
                 "Q4 (248 to 326)" = "#225ea8",
                 "Data Not Available" = "grey80")

map1_categorical(data = cdc.cvd, 
                 join_var = "state",
                 fill_var = "data.cat", 
                 fill_color = colors.cdc, 
                 fill_linewidth = 1.2,
                 fill_linecolor = "black",
                 inset_box_color = "white",
                 territory_label_color = "white",
                 legend_name = "CVD Mortality Rate\nper 100,000 persons",
                 border_ids = border,
                 border_color = "red",
                 border_linewidth = 1.5,
                 save.filepath = "saved-maps/map1-cvd.png") 

Mapping US with Territory Labels

We love maps of the territory geometries, but you might also want a map with the territory labels.

colors.census <- c("Less than 5%" = "#feebe2", 
                    "5% to <10%" = "#f768a1", 
                    "10% or Greater" = "#7a0177")

border <- c("OR", "WI", "VA", "VI")

map2_categorical(data = census.uninsured19, 
                 join_var = "STUSPS", 
                 fill_var = "Percent.Cat", 
                 fill_color = colors.census, 
                 legend_name = "Percent Uninsured",
                 title = "Figure 1. Percent Uninsured, Ages <19 Years",
                 border_ids = border,
                 border_color = "red",
                 border_linewidth = 1,
                 save.filepath = "saved-maps/map2-uninsure.png")

Note that in the current package version, territory labels cannot be highlighted with a border, even when specified in the border ID vector.

colors.cdc <- c("Q1 (166 to < 198)" = "#ffffcc",
                 "Q2 (198 to < 215)" = "#a1dab4",
                 "Q3 (215 to < 248)" = "#41b6c4",
                 "Q4 (248 to 326)" = "#225ea8",
                 "Data Not Available" = "grey80")

map2_categorical(data = cdc.cvd, 
                 join_var = "state",
                 fill_var = "data.cat", 
                 fill_color = colors.cdc, 
                 fill_linewidth = 1.2,
                 fill_linecolor = "black",
                 inset_box_color = "white",
                 legend_name = "CVD Mortality Rate\nper 100,000 persons",
                 border_ids = border,
                 border_color = "red",
                 border_linewidth = 1.5,
                 save.filepath = "saved-maps/map2-cvd.png") 

County-Level Mapping

Prepare Data

To map at the county level, you will need a data frame with two columns:

  1. a categorical variable coded as a factor
  2. the five-digit county FIPS code (commonly labeled as GEOID), stored as a character string so that leading zeros are preserved. The tidycensus package has a useful list that can be loaded and joined to the data frame as needed to facilitate #2.

fips_county <- tidycensus::fips_codes
head(fips_county)
#>   state state_code state_name county_code         county
#> 1    AL         01    Alabama         001 Autauga County
#> 2    AL         01    Alabama         003 Baldwin County
#> 3    AL         01    Alabama         005 Barbour County
#> 4    AL         01    Alabama         007    Bibb County
#> 5    AL         01    Alabama         009  Blount County
#> 6    AL         01    Alabama         011 Bullock County

For this vignette, we’ll use one of the provided datasets, census.uninsured19.co. It is an example of a dataset with complete data for all counties and county-equivalent units in the US states and territories. It is already in the proper format for the provided package functions.

County-level data preparation note: If your dataset does not have complete data for all US state and territory counties, follow the data preparation steps shown in the state- and territory-level example above (the cdc.cvd data). The same steps apply, but use the fips_county data for the join to obtain all county-level geometry IDs.
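The county-level version of those steps might look like the sketch below. To stay self-contained it uses a three-row stand-in for tidycensus::fips_codes (same column names as shown above), made-up percentages, and base R's merge() in place of left_join(). The "Data Not Available" level is carried over from the cdc.cvd example as an assumption.

```r
# Small stand-in for tidycensus::fips_codes (same column names as shown above)
fips_county <- data.frame(
  state_code  = c("01", "01", "01"),
  county_code = c("001", "003", "005"),
  county      = c("Autauga County", "Baldwin County", "Barbour County"),
  stringsAsFactors = FALSE
)

# The five-digit GEOID is the state code followed by the county code
fips_county$GEOID <- paste0(fips_county$state_code, fips_county$county_code)

# Made-up incomplete data: Baldwin County (01003) is missing
my_data <- data.frame(
  GEOID = c("01001", "01005"),
  pct   = c(4.2, 11.8),
  stringsAsFactors = FALSE
)

# Keep every county from the FIPS list; unmatched counties get NA
full <- merge(fips_county, my_data, by = "GEOID", all.x = TRUE)

# Code the factor, giving missing counties their own category
labels <- c("Less than 5%", "5% to <10%", "10% or Greater")
cat_chr <- as.character(cut(full$pct, breaks = c(-Inf, 5, 10, Inf),
                            labels = labels, right = FALSE))
cat_chr[is.na(cat_chr)] <- "Data Not Available"
full$Percent.Cat <- factor(cat_chr, levels = c(labels, "Data Not Available"))

full[, c("GEOID", "county", "Percent.Cat")]
```

The county with no data is kept in the result and falls into the "Data Not Available" category, so it can still be given a fill color.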

data(census.uninsured19.co)

We can see that census.uninsured19.co has these two components:

  1. “Percent.Cat”: the Percentage Ages 19 or Under with No Health Insurance, categorized as a factor
  2. “GEOID”: the five-digit county FIPS code, stored as a character string

class(census.uninsured19.co$Percent.Cat)
#> [1] "factor"
table(census.uninsured19.co$Percent.Cat)
#> 
#>   Less than 5%     5% to <10% 10% or Greater 
#>           1715            993            524
head(census.uninsured19.co$GEOID)
#> [1] "01001" "01003" "01005" "01007" "01009" "01011"
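GEOIDs that have been read in as numbers (for example from a spreadsheet or CSV file) lose their leading zeros and will no longer match codes such as "01001". A minimal sketch of restoring the five-character form with base R's sprintf() follows; the input values are made up for illustration.

```r
# Made-up example: GEOIDs that were read in as numbers lose leading zeros
geoid_num <- c(1001, 6037, 72001)

# Re-pad to five characters with leading zeros
geoid_chr <- sprintf("%05d", as.integer(geoid_num))
geoid_chr
#> [1] "01001" "06037" "72001"
```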

Mapping US with Territory Geometries at the County (equivalent) Level

Using Census Insurance Data

Start by defining the fill category colors with their factor labels.

colors.census <- c("Less than 5%" = "#feebe2", 
                    "5% to <10%" = "#f768a1", 
                    "10% or Greater" = "#7a0177")

Then specify any required parameters of the function (see documentation for details).

Note that the default for the county geometry year (county_data_year) is "2020", which uses the 2020 county geometries. Set county_data_year = "2010" to map with the 2010 county geometry file instead.

map1_categorical_county(data = census.uninsured19.co, 
                        join_var = "GEOID",
                        county_data_year = "2020",  
                        fill_var = "Percent.Cat", 
                        fill_color = colors.census, 
                        fill_linewidth = 0.5, 
                        fill_linecolor = "gray50",
                        legend_name = "Percent Uninsured",
                        title = "Figure 1. Percent Uninsured, Ages <19 Years",
                        state_color = "black", 
                        state_linewidth = 1,
                        save.filepath = "saved-maps/map1-uninsure-co.png")